ABSTRACT

Visual rendering of graphs is a key task in the mapping of complex 
network data. Although most graph drawing algorithms emphasize 
aesthetic appeal, certain applications such as travel-time maps place 
more importance on visualization of structural network properties. 
The present paper advocates a graph embedding approach with cen- 
trality considerations to comply with node hierarchy. The problem is 
formulated as one of constrained multi-dimensional scaling (MDS), 
and it is solved via block coordinate descent iterations with succes- 
sive approximations and guaranteed convergence to a KKT point. 
In addition, a regularization term enforcing graph smoothness is in- 
corporated with the goal of reducing edge crossings. Experimental 
results demonstrate that the algorithm converges, and can be used to 
efficiently embed large graphs on the order of thousands of nodes. 
Index Terms — MDS, graph embedding, coordinate descent. 

1. INTRODUCTION 

Graphs offer a valuable means of encoding relational information 
between entities of complex systems, arising in modern
communications, transportation and social networks, among others. Despite the
abundance of network analysis techniques, information visualization 
is a powerful tool for capturing patterns that may not be apparent in 
large-scale systems. However, most visualization algorithms focus 
more on aesthetic appeal than the structural characteristics of the 
underlying data. Such network structure is captured through graph- 
theoretic notions such as node centrality and network cohesion. 

The present paper deals with embedding graphs for visualiza- 
tion while adhering to the underlying node centrality structure. Cen- 
trality measures capture the relative importance of network nodes 
among their peers. Betweenness centrality, for instance, describes
the extent to which information is routed through a specific node by
measuring the fraction of all shortest paths traversing this node; see
e.g., [1, p. 89]. Other measures include closeness, eigenvalue, and
degree centrality. To incorporate centrality using any of these metrics,
an MDS (so-termed stress [2, Chap. 3]) criterion is adopted,
under radial constraints that place nodes of higher centrality closer 
to the origin of the graph embedding. MDS seeks a low-dimensional 
depiction of high-dimensional data in which pairwise Euclidean dis- 
tances between embedding coordinates are close (in a well-defined 
sense) to the dissimilarities between the original data points. Close- 
ness criteria (a.k.a. stress costs) are generally non-convex, and the 
quest for global optimality is challenging because ordinary descent 
methods do not have optimality guarantees, and are sensitive to
initialization. Successive approximation with global and convex upper
bounds is used in [2, Chap. 8] to minimize the stress cost yielding
near-optimal results.

The novel approach exploits the block separability inherent to 
the proposed model and adapts the coordinate descent algorithm to 
determine the optimal embedding. Edge crossings are reduced by
regularizing the cost with a smoothness-promoting term weighted by
a tuning parameter. Smoothness encourages nodes that share an edge
to lie closer to each other in the embedding. As a result, edge
lengths, and hence the number of edge crossings in the network
visualization, are markedly reduced. In addition, the regularization
term offers the benefit of incorporating the underlying network
topology when the dissimilarities considered are not graph-theoretic,
e.g., Euclidean distances between feature vectors associated with each
node. Moreover, numerical tests illustrate that judicious selection of the tuning
parameter results in fewer block coordinate descent iterations, which 
in turn yields a visually appealing embedding. 

To place the present work in context, a prior approach iteratively
minimizes a weighted stress function with iteration-dependent
weights chosen to incorporate radial constraints [3]. However, it is
limited to graph-theoretic dissimilarities, and offers no convergence
guarantees. A heuristic algorithm for network visualization uses the
k-core decomposition to hierarchically place nodes within "onion-like"
concentric shells [4]. Although effective for large-scale networks,
it has no optimality criterion associated with it, and is limited to
visualization in two dimensions only. The proposed approach scales well
for large networks under a well-defined optimality criterion with a 
convergence guarantee. 

2. MODEL AND PROBLEM STATEMENT 

Consider a network represented by an undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$,
where $\mathcal{E}$ denotes the set of edges, and $\mathcal{V}$ the set of vertices with
cardinality $|\mathcal{V}| = N$. Let $\delta_{ij}$ denote the pairwise dissimilarity (edge
weight) between any two nodes $i$ and $j$. Given the set $\{\delta_{ij}\}$ and
the prescribed embedding dimension $p$, the graph embedding task
amounts to finding $p \times 1$ vectors $\{\mathbf{x}_i\}_{i=1}^{N}$ so that the embedding
coordinates $\mathbf{x}_i$ and $\mathbf{x}_j$ satisfy $\|\mathbf{x}_i - \mathbf{x}_j\|_2 \approx \delta_{ij}$.

With $\delta_{ij} = \delta_{ji}$, it suffices to know $\{\{\delta_{ij}\}_{i=1}^{N}\}_{j=i+1}^{N}$, or to be
able to determine them from $\mathcal{G}$. Most visualization schemes assign
$\delta_{ij}$ to the shortest path distance between nodes $i$ and $j$. In this work,
the Euclidean commute-time distance (ECTD) is adopted because
it decreases as the number of shortest paths between node pairs
increases [5]. This is more reasonable since having multiple shortest
paths between node pairs endows them with a higher level of
accessibility by, e.g., a random walker on the graph.
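As a hedged illustration of this choice of dissimilarity, the sketch below computes ECTD from the Moore-Penrose pseudoinverse of the graph Laplacian. It relies on the standard identity that the average commute time between nodes i and j equals vol(G) times (l_ii + l_jj - 2 l_ij), with l_ij the entries of the pseudoinverse, and takes its square root. The function name `ectd_matrix` is hypothetical, and the graph is assumed connected and undirected; the paper does not prescribe an implementation.

```python
import numpy as np

def ectd_matrix(A):
    """Euclidean commute-time distances for a connected undirected graph.

    A is a symmetric (N x N) adjacency matrix with nonnegative weights.
    """
    A = np.asarray(A, dtype=float)
    degrees = A.sum(axis=1)
    L = np.diag(degrees) - A              # graph Laplacian L = D - A
    L_pinv = np.linalg.pinv(L)            # Moore-Penrose pseudoinverse
    diag = np.diag(L_pinv)
    volume = degrees.sum()                # vol(G): sum of all node degrees
    # Average commute time between every pair of nodes.
    commute = volume * (diag[:, None] + diag[None, :] - 2.0 * L_pinv)
    commute = np.maximum(commute, 0.0)    # guard against round-off below zero
    return np.sqrt(commute)
```

The resulting matrix can be used directly as the set of pairwise dissimilarities in the stress cost.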

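MDS amounts to solving the following problem:

$$\text{(P0)}\quad \{\hat{\mathbf{x}}_i\}_{i=1}^{N} = \arg\min_{\mathbf{x}_1,\ldots,\mathbf{x}_N} \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\left[\|\mathbf{x}_i - \mathbf{x}_j\|_2 - \delta_{ij}\right]^2 \tag{1}$$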
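For concreteness, the stress cost minimized by MDS can be evaluated as in the minimal sketch below. The function name `mds_stress` and the 1/2 scaling are assumptions made for illustration; the scaling does not affect the minimizer.

```python
import numpy as np

def mds_stress(X, Delta):
    """Stress of an embedding X (N x p) for dissimilarities Delta (N x N)."""
    X = np.asarray(X, dtype=float)
    Delta = np.asarray(Delta, dtype=float)
    diffs = X[:, None, :] - X[None, :, :]      # all pairwise differences
    dists = np.linalg.norm(diffs, axis=2)      # pairwise Euclidean distances
    return 0.5 * np.sum((dists - Delta) ** 2)
```

An embedding whose pairwise distances reproduce the dissimilarities exactly has zero stress; any mismatch is penalized quadratically.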

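Turning attention to node centralities $\{c_i\}_{i=1}^{N}$, those can be
obtained using a number of algorithms [1, Chap. 4]. Centrality structure
will be imposed on (1) by constraining $\mathbf{x}_i$ to have a
centrality-dependent radial distance $f(c_i)$, where $f(\cdot)$ is a monotone
decreasing function. The resulting constrained optimization problem now
becomes

$$\text{(P1)}\quad \{\hat{\mathbf{x}}_i\}_{i=1}^{N} = \arg\min_{\mathbf{x}_1,\ldots,\mathbf{x}_N} \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\left[\|\mathbf{x}_i - \mathbf{x}_j\|_2 - \delta_{ij}\right]^2 \quad \text{s. to } \|\mathbf{x}_i\|_2 = f(c_i),\ i = 1,\ldots,N. \tag{2}$$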



Although P0 is non-convex, standard solvers rely on gradient descent
iterations, which have no guarantees of convergence to the global
optimum [6]. Lack of convexity is exacerbated in P1 by the non-convex
constraint set, rendering its solution even more challenging
than that of P0. However, considering a single embedding vector
$\mathbf{x}_i$, and fixing the rest $\{\mathbf{x}_j\}_{j \neq i}$, the constraint set simplifies to
$\|\mathbf{x}_i\|_2 = f(c_i)$, for which an appropriate relaxation can be sought.
The key to the algorithm proposed next lies in this inherent decoupling
of the centrality constraints.

3. BCD WITH SUCCESSIVE APPROXIMATIONS 

By exploiting the separable nature of the cost as well as the norm
constraints in (2), block coordinate descent (BCD) will be adopted
in this section to arrive at a solution approaching the global optimum.
To this end, the centering constraint $\sum_{i=1}^{N} \mathbf{x}_i = \mathbf{0}$, typically
invoked to fix the inherent translation ambiguity, will be dropped
first so that the problem remains decoupled across nodes. The effect
of this relaxation can be compensated for by computing the centroid
of the solution of (2), and subtracting it from each coordinate. The
equality norm constraints are also relaxed to $\|\mathbf{x}_i\|_2 \leq f(c_i)$.
Although the entire constraint set is non-convex, each relaxed
constraint is a convex and closed Euclidean ball with respect to each
node in the network.

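Let $\mathbf{x}_i^r$ denote the minimizer of the optimization problem over
block $i$, when the remaining blocks $\{\mathbf{x}_j\}_{j \neq i}$ are fixed during the
BCD iteration $r$. By fixing the blocks $\{\mathbf{x}_j\}_{j \neq i}$ to their values from
the most recent iterations, the sought embedding is obtained as

$$\mathbf{x}_i^r = \arg\min_{\{\mathbf{x}_i : \|\mathbf{x}_i\|_2 \leq f(c_i)\}} \frac{1}{2}\sum_{j=1}^{i-1}\left[\|\mathbf{x}_i - \mathbf{x}_j^{r}\|_2 - \delta_{ij}\right]^2 + \frac{1}{2}\sum_{j=i+1}^{N}\left[\|\mathbf{x}_i - \mathbf{x}_j^{r-1}\|_2 - \delta_{ij}\right]^2 \tag{3}$$

or equivalently as

$$\mathbf{x}_i^r = \arg\min_{\{\mathbf{x}_i : \|\mathbf{x}_i\|_2 \leq f(c_i)\}} \frac{N-1}{2}\|\mathbf{x}_i\|_2^2 - \mathbf{x}_i^\top\Big(\sum_{j<i}\mathbf{x}_j^{r} + \sum_{j>i}\mathbf{x}_j^{r-1}\Big) - \sum_{j<i}\delta_{ij}\|\mathbf{x}_i - \mathbf{x}_j^{r}\|_2 - \sum_{j>i}\delta_{ij}\|\mathbf{x}_i - \mathbf{x}_j^{r-1}\|_2 \tag{4}$$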



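where $\sum_{j<i}(\cdot) := \sum_{j=1}^{i-1}(\cdot)$ and $\sum_{j>i}(\cdot) := \sum_{j=i+1}^{N}(\cdot)$. With the last two
sums in (4) being non-convex and non-smooth, convergence of the
BCD algorithm cannot be guaranteed [7, p. 272]. Moreover, it is
desired to have each per-iteration subproblem solvable to global
optimality, in closed form and at a minimum computational cost. The
proposed approach seeks a global upper bound of the objective with
the desirable properties of smoothness and convexity. To this end,
consider the function $\Psi(\mathbf{x}_i) := \psi_1(\mathbf{x}_i) - \psi_2(\mathbf{x}_i)$, where

$$\psi_1(\mathbf{x}_i) := \frac{N-1}{2}\|\mathbf{x}_i\|_2^2 - \mathbf{x}_i^\top\Big(\sum_{j<i}\mathbf{x}_j^{r} + \sum_{j>i}\mathbf{x}_j^{r-1}\Big) \tag{5}$$

and

$$\psi_2(\mathbf{x}_i) := \sum_{j<i}\delta_{ij}\|\mathbf{x}_i - \mathbf{x}_j^{r}\|_2 + \sum_{j>i}\delta_{ij}\|\mathbf{x}_i - \mathbf{x}_j^{r-1}\|_2. \tag{6}$$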
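The split of the per-block cost into a quadratic part and a sum of weighted norms follows from expanding the squares. A short worked expansion is sketched below, assuming the per-block cost carries a 1/2 scaling and writing $\mathbf{x}_j$ for the most recent value of block $j$ (iteration $r$ for $j < i$, iteration $r-1$ for $j > i$):

```latex
\begin{align*}
\tfrac{1}{2}\sum_{j \neq i}\bigl[\|\mathbf{x}_i - \mathbf{x}_j\|_2 - \delta_{ij}\bigr]^2
 &= \tfrac{1}{2}\sum_{j \neq i}\bigl(\|\mathbf{x}_i\|_2^2 - 2\mathbf{x}_i^\top\mathbf{x}_j
    + \|\mathbf{x}_j\|_2^2 - 2\delta_{ij}\|\mathbf{x}_i - \mathbf{x}_j\|_2 + \delta_{ij}^2\bigr) \\
 &= \tfrac{N-1}{2}\|\mathbf{x}_i\|_2^2 - \mathbf{x}_i^\top\sum_{j \neq i}\mathbf{x}_j
    - \sum_{j \neq i}\delta_{ij}\|\mathbf{x}_i - \mathbf{x}_j\|_2 + \text{const.}
\end{align*}
```

The first two terms form the convex quadratic part, the third is the negated sum of weighted norms, and the constant collects the terms that do not depend on $\mathbf{x}_i$.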



Note that $\psi_1(\mathbf{x}_i)$ is a convex quadratic function, and that $\psi_2(\mathbf{x}_i)$
is convex (with respect to $\mathbf{x}_i$) but non-differentiable. The first-order
approximation of (6) at any point in its domain is a global
under-estimate of $\psi_2(\mathbf{x}_i)$. Despite the non-smoothness at some points,
such a lower bound can always be established using its subdifferential.
As a consequence of the convexity of $\psi_2(\mathbf{x}_i)$, it holds that
[7, p. 731]

$$\psi_2(\mathbf{x}) \geq \psi_2(\mathbf{x}_0) + \mathbf{g}^\top(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0), \quad \forall \mathbf{x} \in \mathrm{dom}(\psi_2) \tag{7}$$

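where $\mathbf{g}(\mathbf{x}) \in \partial\psi_2(\mathbf{x})$ is a subgradient within the subdifferential set
$\partial\psi_2(\mathbf{x})$ of $\psi_2(\mathbf{x})$. The subdifferential of $\|\mathbf{x}_i - \mathbf{x}_j\|_2$ with respect to
$\mathbf{x}_i$ is given by

$$\partial_{\mathbf{x}_i}\|\mathbf{x}_i - \mathbf{x}_j\|_2 = \begin{cases} \dfrac{\mathbf{x}_i - \mathbf{x}_j}{\|\mathbf{x}_i - \mathbf{x}_j\|_2}, & \text{if } \mathbf{x}_i \neq \mathbf{x}_j \\[2mm] \{\mathbf{y} \in \mathbb{R}^p : \|\mathbf{y}\|_2 \leq 1\}, & \text{otherwise} \end{cases} \tag{8}$$

which implies that

$$\partial_{\mathbf{x}_i}\psi_2(\mathbf{x}_i) = \sum_{j=1,\, j \neq i}^{N}\delta_{ij}\,\partial_{\mathbf{x}_i}\|\mathbf{x}_i - \mathbf{x}_j\|_2. \tag{9}$$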
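A subgradient of the Euclidean distance can be picked as in the minimal sketch below. Returning the zero vector at the kink is one admissible choice among all vectors of norm at most one, and is an assumption of this sketch; the function name `norm_subgradient` is likewise hypothetical.

```python
import numpy as np

def norm_subgradient(x_i, x_j):
    """One element of the subdifferential of ||x_i - x_j||_2 w.r.t. x_i."""
    diff = np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)
    dist = np.linalg.norm(diff)
    if dist > 0.0:
        return diff / dist        # unique gradient away from the kink
    return np.zeros_like(diff)    # any y with ||y||_2 <= 1 is valid; 0 is one choice
```

Weighting these vectors by the dissimilarities and summing over the other nodes gives a subgradient of the full sum of weighted distances.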



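Using (7), it is possible to lower bound (6) by

$$\psi_2'(\mathbf{x}_i, \mathbf{x}_0) = \sum_{j<i}\delta_{ij}\left[\|\mathbf{x}_0 - \mathbf{x}_j^{r}\|_2 + (\mathbf{g}_j^{r})^\top(\mathbf{x}_0)(\mathbf{x}_i - \mathbf{x}_0)\right] + \sum_{j>i}\delta_{ij}\left[\|\mathbf{x}_0 - \mathbf{x}_j^{r-1}\|_2 + (\mathbf{g}_j^{r-1})^\top(\mathbf{x}_0)(\mathbf{x}_i - \mathbf{x}_0)\right] \tag{10}$$

where $\mathbf{g}_j^{r}(\mathbf{x}_0)$ denotes a subgradient of $\|\mathbf{x}_i - \mathbf{x}_j^{r}\|_2$ with respect to $\mathbf{x}_i$, evaluated at $\mathbf{x}_0$.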



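Consider now $\Phi(\mathbf{x}_i, \mathbf{x}_0) := \psi_1(\mathbf{x}_i) - \psi_2'(\mathbf{x}_i, \mathbf{x}_0)$, and note that
$\Phi(\mathbf{x}_i, \mathbf{x}_0)$ is convex and upper bounds globally the cost in (4). The
proposed BCD algorithm involves successive approximations using
(10), and yields the following QCQP for each block

$$\text{(P2)}\quad \arg\min_{\{\mathbf{x}_i : \|\mathbf{x}_i\|_2 \leq f(c_i)\}} \Phi(\mathbf{x}_i, \mathbf{x}_0). \tag{11}$$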



For convergence, $\mathbf{x}_0$ must be selected to satisfy the following
conditions [8]:

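$$\Phi(\mathbf{x}_0, \mathbf{x}_0) = \Psi(\mathbf{x}_0), \quad \forall \mathbf{x}_0 \in \mathcal{C},\ \forall i \tag{12a}$$
$$\Phi(\mathbf{x}_i, \mathbf{x}_0) \geq \Psi(\mathbf{x}_i), \quad \|\mathbf{x}_i\|_2 \leq f(c_i),\ \forall i \tag{12b}$$

where $\mathcal{C} := \bigcup_{i=1}^{N}\{\mathbf{x}_i : \|\mathbf{x}_i\|_2 \leq f(c_i)\}$. In addition, $\Phi(\mathbf{x}_i, \mathbf{x}_0)$
must be continuous in $(\mathbf{x}_i, \mathbf{x}_0)$. Upon selecting $\mathbf{x}_0 = \mathbf{x}_i^{r-1}$, the
iterate $\mathbf{x}_i^{r-1}$ satisfies (12a) and (12b). Taking successive
approximations around $\mathbf{x}_i^{r-1}$ in P2 ensures the uniqueness of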

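$$\mathbf{x}_i^r = \arg\min_{\{\mathbf{x}_i : \|\mathbf{x}_i\|_2 \leq f(c_i)\}} \frac{N-1}{2}\mathbf{x}_i^\top\mathbf{x}_i - \mathbf{x}_i^\top\Big[\sum_{j<i}\big(\mathbf{x}_j^{r} + \delta_{ij}\mathbf{g}_j^{r}(\mathbf{x}_i^{r-1})\big) + \sum_{j>i}\big(\mathbf{x}_j^{r-1} + \delta_{ij}\mathbf{g}_j^{r-1}(\mathbf{x}_i^{r-1})\big)\Big] \tag{13}$$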

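Solving (13) amounts to obtaining the solution of the unconstrained
QP, $(\mathbf{x}^*)^r$, and projecting it onto $\{\mathbf{x}_i : \|\mathbf{x}_i\|_2 \leq f(c_i)\}$; that is,

$$\mathbf{x}_i^r = \begin{cases} \dfrac{(\mathbf{x}^*)^r}{\|(\mathbf{x}^*)^r\|_2}\, f(c_i), & \text{if } \|(\mathbf{x}^*)^r\|_2 > f(c_i) \\[2mm] (\mathbf{x}^*)^r, & \text{otherwise} \end{cases} \tag{14}$$

where

$$(\mathbf{x}^*)^r = \frac{1}{N-1}\Big[\sum_{j<i}\big(\mathbf{x}_j^{r} + \delta_{ij}\mathbf{g}_j^{r}(\mathbf{x}_i^{r-1})\big) + \sum_{j>i}\big(\mathbf{x}_j^{r-1} + \delta_{ij}\mathbf{g}_j^{r-1}(\mathbf{x}_i^{r-1})\big)\Big]. \tag{15}$$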
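The per-block step, a closed-form unconstrained minimizer followed by a projection onto the ball of radius f(c_i), can be sketched as follows. The function name `block_update`, the in-place convention that rows of `X` with index below `i` already hold current-sweep values, and the zero subgradient at coincident points are assumptions of this sketch rather than details given in the text.

```python
import numpy as np

def block_update(i, X, Delta, radius):
    """Update for row i of X: closed-form minimizer, then projection.

    X holds the latest coordinates: rows j < i already updated in the current
    sweep, rows j > i from the previous sweep. radius plays the role of f(c_i).
    """
    N = X.shape[0]
    x_prev = X[i].copy()                  # expansion point: previous iterate of block i
    total = np.zeros_like(x_prev)
    for j in range(N):
        if j == i:
            continue
        diff = x_prev - X[j]
        dist = np.linalg.norm(diff)
        g = diff / dist if dist > 0.0 else np.zeros_like(diff)
        total += X[j] + Delta[i, j] * g   # x_j + delta_ij * g_j
    x_star = total / (N - 1)              # unconstrained minimizer
    norm = np.linalg.norm(x_star)
    if norm > radius:                     # project onto the ball of radius f(c_i)
        x_star = x_star * (radius / norm)
    return x_star
```

With two nodes at distance 1 and a target dissimilarity of 2, the update moves the free node out to distance 2 unless the radius constraint clips it.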

It is desirable but not necessary that the algorithm converges because,
depending on the application, reasonable network visualizations can
be found with fewer iterations. In fact, successive approximations
merely provide a more refined graph embedding that may be more
aesthetically appealing.

Although the proposed algorithm is guaranteed to converge, the
solution is only unique up to a rotation and a translation (cf. MDS).
In order to eliminate the translational ambiguity, the embedding can
be centered at the origin. Assuming that the optimal blocks determined
within outer iteration $r$ are reassembled into the embedding
matrix $\mathbf{X}^r := [\mathbf{x}_1^r, \ldots, \mathbf{x}_N^r]^\top$, the final step involves
subtracting the mean from each coordinate using the centering operator
as follows, $\mathbf{X} = (\mathbf{I} - N^{-1}\mathbf{1}\mathbf{1}^\top)\mathbf{X}^r$, where $\mathbf{I}$ denotes the $N \times N$
identity matrix, and $\mathbf{1}$ is the $N \times 1$ vector of all ones.
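The centering operator amounts to subtracting the centroid from every row of the embedding matrix, as the short sketch below illustrates. The function name `center_embedding` is hypothetical, and the explicit N x N matrix is formed only to mirror the formula; subtracting the column means directly is cheaper.

```python
import numpy as np

def center_embedding(X):
    """Apply (I - (1/N) 1 1^T) X, i.e. subtract the centroid from every row."""
    X = np.asarray(X, dtype=float)
    N = X.shape[0]
    ones = np.ones((N, 1))
    centering = np.eye(N) - (ones @ ones.T) / N   # centering operator
    return centering @ X
```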

The novel graph embedding scheme is summarized as Algorithm 1
with matrix $\boldsymbol{\Delta}$ having $(i,j)$th entry the dissimilarity $\delta_{ij}$.



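Algorithm 1 BCD algorithm with successive approximations

Input: $\{c_i\}_{i=1}^{N}$, $\boldsymbol{\Delta}$, $\epsilon$
Initialize $\mathbf{X}^0$, $r = 0$
repeat
    $r = r + 1$
    for $i = 1 \ldots N$ do
        Compute $\mathbf{x}_i^r$ according to (14) and (15)
        $\mathbf{X}^r(i,:) = (\mathbf{x}_i^r)^\top$
    end for
until $\|\mathbf{X}^r - \mathbf{X}^{r-1}\|_F \leq \epsilon$
$\mathbf{X} = (\mathbf{I} - \frac{1}{N}\mathbf{1}\mathbf{1}^\top)\mathbf{X}^r$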
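A compact, hedged rendering of Algorithm 1 is sketched below. The random initialization, the iteration cap `max_iter`, the default tolerance, and the name `bcd_embedding` are assumptions added for illustration; `radii[i]` stands for f(c_i) and is taken as given.

```python
import numpy as np

def bcd_embedding(Delta, radii, p=2, eps=1e-6, max_iter=500, seed=0):
    """Sketch of BCD with successive approximations under radial constraints.

    Delta : (N x N) symmetric dissimilarities; radii[i] stands for f(c_i).
    """
    Delta = np.asarray(Delta, dtype=float)
    radii = np.asarray(radii, dtype=float)
    N = Delta.shape[0]
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((N, p))               # initial embedding (assumed random)
    for _ in range(max_iter):
        X_old = X.copy()
        for i in range(N):
            diff = X[i] - X                        # previous x_i minus latest x_j
            dist = np.linalg.norm(diff, axis=1)
            safe = np.where(dist > 0.0, dist, 1.0)
            # Subgradients of the distances; zero where points coincide.
            G = np.where(dist[:, None] > 0.0, diff / safe[:, None], 0.0)
            terms = X + Delta[i][:, None] * G      # x_j + delta_ij * g_j
            terms[i] = 0.0                         # exclude j = i from the sum
            x_star = terms.sum(axis=0) / (N - 1)   # unconstrained minimizer
            norm = np.linalg.norm(x_star)
            if norm > radii[i]:                    # projection onto the ball
                x_star *= radii[i] / norm
            X[i] = x_star                          # later blocks see this update
        if np.linalg.norm(X - X_old, 'fro') <= eps:
            break
    return X - X.mean(axis=0)                      # final centering step
```

Because row `i` is overwritten as soon as it is computed, blocks with larger index use the freshest values within the same sweep, which is the Gauss-Seidel ordering the block update relies on.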



4. ENFORCING GRAPH SMOOTHNESS 

In this section, the MDS stress in (2) is regularized through an
additional constraint that encourages smoothness over the graph.
Intuitively, despite the requirement that the node placement in
low-dimensional Euclidean space respects inherent network structure,
through preserving e.g., node centralities, neighboring nodes in a
graph-theoretic sense (meaning nodes that share an edge) are
expected to be close in Euclidean distance within the embedding. Such
a requirement can be captured by incorporating a constraint that
discourages large distances between neighboring nodes. In essence, this
constraint enforces smoothness over the graph embedding.

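A popular choice of a smoothness-promoting function is $h(\mathbf{X}) := \mathrm{Tr}(\mathbf{X}^\top\mathbf{L}\mathbf{X})$,
where $\mathrm{Tr}(\cdot)$ denotes the trace operator, and $\mathbf{L} := \mathbf{D} - \mathbf{A}$
is the graph Laplacian with $\mathbf{D}$ a diagonal matrix whose $(i,i)$th entry
is the degree of node $i$, and $\mathbf{A}$ the adjacency matrix. It can be shown
that $h(\mathbf{X}) = (1/2)\sum_{i=1}^{N}\sum_{j=1}^{N} a_{ij}\|\mathbf{x}_i - \mathbf{x}_j\|_2^2$, where $a_{ij}$ is the $(i,j)$th
entry of $\mathbf{A}$. Motivated by penalty methods in optimization, the cost
in (2) will be augmented as follows

$$\text{(P3)}\quad \arg\min_{\mathbf{x}_1,\ldots,\mathbf{x}_N} \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\left[\|\mathbf{x}_i - \mathbf{x}_j\|_2 - \delta_{ij}\right]^2 + \frac{\lambda}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} a_{ij}\|\mathbf{x}_i - \mathbf{x}_j\|_2^2 \quad \text{s. to } \|\mathbf{x}_i\|_2 = f(c_i),\ i = 1,\ldots,N \tag{16}$$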

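where the scalar $\lambda \geq 0$ controls the degree of smoothness. The
penalty term has a separable structure and is convex with respect
to $\mathbf{x}_i$. Consequently, P3 lies within the framework of successive
approximations required to solve each per-iteration subproblem.
Following the same relaxations and invoking the successive upper
bound approximations described earlier yields the following QCQP

$$\mathbf{x}_i^r = \arg\min_{\{\mathbf{x}_i : \|\mathbf{x}_i\|_2 \leq f(c_i)\}} \frac{N + \lambda d_{ii} - 1}{2}\mathbf{x}_i^\top\mathbf{x}_i - \mathbf{x}_i^\top\Big[\sum_{j<i}\big((1 + \lambda a_{ij})\mathbf{x}_j^{r} + \delta_{ij}\mathbf{g}_j^{r}(\mathbf{x}_i^{r-1})\big) + \sum_{j>i}\big((1 + \lambda a_{ij})\mathbf{x}_j^{r-1} + \delta_{ij}\mathbf{g}_j^{r-1}(\mathbf{x}_i^{r-1})\big)\Big] \tag{17}$$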

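with $d_{ii} := \sum_{j=1}^{N} a_{ij}$ denoting the degree of node $i$.
The solution of (17) is again given by the projection in (14), where now

$$(\mathbf{x}^*)^r = \frac{1}{N + \lambda d_{ii} - 1}\Big[\sum_{j<i}\big((1 + \lambda a_{ij})\mathbf{x}_j^{r} + \delta_{ij}\mathbf{g}_j^{r}(\mathbf{x}_i^{r-1})\big) + \sum_{j>i}\big((1 + \lambda a_{ij})\mathbf{x}_j^{r-1} + \delta_{ij}\mathbf{g}_j^{r-1}(\mathbf{x}_i^{r-1})\big)\Big]. \tag{18}$$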

With $\lambda$ given, Algorithm 2 summarizes the steps to determine the
constrained embedding with a smoothness penalty.
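The smoothness-regularized block update differs from the unregularized one only in the neighbor weights and the normalizing constant. A minimal sketch follows, assuming a symmetric adjacency matrix with zero diagonal; the function name `smooth_block_update` and the parameter name `lam` (for the smoothness weight) are hypothetical.

```python
import numpy as np

def smooth_block_update(i, X, Delta, A, radius, lam):
    """Block update with a smoothness penalty of weight lam (zero diagonal in A assumed)."""
    X = np.asarray(X, dtype=float)
    Delta = np.asarray(Delta, dtype=float)
    A = np.asarray(A, dtype=float)
    N = X.shape[0]
    diff = X[i] - X
    dist = np.linalg.norm(diff, axis=1)
    safe = np.where(dist > 0.0, dist, 1.0)
    G = np.where(dist[:, None] > 0.0, diff / safe[:, None], 0.0)
    weights = 1.0 + lam * A[i]                        # (1 + lam * a_ij)
    terms = weights[:, None] * X + Delta[i][:, None] * G
    terms[i] = 0.0                                    # exclude j = i
    degree = A[i].sum()                               # degree of node i
    x_star = terms.sum(axis=0) / (N + lam * degree - 1.0)
    norm = np.linalg.norm(x_star)
    if norm > radius:
        x_star = x_star * (radius / norm)             # projection onto the ball
    return x_star
```

Setting `lam = 0` recovers the unregularized update, while larger values pull a node toward its single-hop neighbors.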



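Algorithm 2 Incorporating smoothness in Algorithm 1

Input: $\mathbf{A}$, $\{c_i\}_{i=1}^{N}$, $\boldsymbol{\Delta}$, $\epsilon$, $\lambda$
Initialize $\mathbf{X}^0$, $r = 0$
repeat
    $r = r + 1$
    for $i = 1 \ldots N$ do
        Compute $\mathbf{x}_i^r$ according to (14) and (18)
        $\mathbf{X}^r(i,:) = (\mathbf{x}_i^r)^\top$
    end for
until $\|\mathbf{X}^r - \mathbf{X}^{r-1}\|_F \leq \epsilon$
$\mathbf{X} = (\mathbf{I} - \frac{1}{N}\mathbf{1}\mathbf{1}^\top)\mathbf{X}^r$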
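To monitor the smoothness penalty that Algorithm 2 trades off against the stress, the trace form and the pairwise form of the penalty can both be evaluated, as in the sketch below. It assumes a symmetric adjacency matrix; the function names are hypothetical.

```python
import numpy as np

def smoothness_trace(X, A):
    """Smoothness penalty in trace form: Tr(X^T L X) with L = D - A."""
    X = np.asarray(X, dtype=float)
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A                 # graph Laplacian
    return float(np.trace(X.T @ L @ X))

def smoothness_pairwise(X, A):
    """Same penalty as (1/2) * sum_ij a_ij * ||x_i - x_j||_2^2."""
    X = np.asarray(X, dtype=float)
    A = np.asarray(A, dtype=float)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    return float(0.5 * np.sum(A * sq))
```

The two functions agree up to round-off for symmetric adjacency matrices, which gives a quick numerical check of the identity.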



5. NUMERICAL EXPERIMENTS 
5.1. Visualizing the London Tube

In the first experiment, an undirected graph of 307 nodes representing
the London tube, an underground train transit network¹, is considered.
The nodes represent stations whereas the edges represent
the routes connecting them. The objective is to generate an embedding
in which stations traversed by most routes are placed closer to
the center, thus highlighting their relative significance in metro
transit. Such information is best captured by the betweenness centrality,
which is defined as $c_i := \sum_{j \neq k \neq i \in \mathcal{V}} \sigma^{i}_{j,k} / \big(\sum_{l \in \mathcal{V}} \sigma^{l}_{j,k}\big)$, where $\sigma^{i}_{j,k}$
is the number of shortest paths between nodes $j$ and $k$ through node
$i$ [10]. The centrality values were transformed as follows:

$$f(c_i) = \frac{\mathrm{diam}(\mathcal{G})}{2}\left(1 - \frac{c_i - \min_{i \in \mathcal{V}} c_i}{\max_{i \in \mathcal{V}} c_i - \min_{i \in \mathcal{V}} c_i}\right) \tag{19}$$

¹ https://wikis.bris.ac.uk/dispIay/ipshe/London-l-Tube

Fig. 1. Centrality-constrained embedding of the London tube

Fig. 2. MDS stress iterations



with $\mathrm{diam}(\mathcal{G})$ denoting the diameter of $\mathcal{G}$. Simulations were run
for several values of $\lambda$ starting with $\lambda = 0$, and the resultant
two-dimensional ($p = 2$) embeddings were plotted. Figure 1 depicts
the optimal embedding obtained without a smoothness penalty. The
color grading reflects the centrality levels of the nodes from highest
(yellow) to lowest (red). Algorithm 1 converged after approximately
150 outer iterations as shown in Figure 2.
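A monotone decreasing map from centrality to radius can be sketched as below. The min-max form, scaled so that radii fall between 0 and half the graph diameter, is an assumption made for illustration, as are the name `centrality_to_radius` and the choice to reject inputs whose centralities are all equal.

```python
def centrality_to_radius(centralities, diameter):
    """Map centralities to radii in [0, diameter / 2]; higher centrality, smaller radius."""
    c_min = min(centralities)
    c_max = max(centralities)
    span = c_max - c_min
    if span == 0:
        raise ValueError("centralities are all equal; the rescaling is undefined")
    # Min-max rescaling, flipped so that the most central node sits at the origin.
    return [0.5 * diameter * (1.0 - (c - c_min) / span) for c in centralities]
```

Under this map the most central node is placed at the origin and the least central nodes on the outermost circle.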

Figure 3 illustrates the effect of including the smoothness
penalty. Increasing $\lambda$ promotes embeddings in which edge crossings
are minimized. This intuitively makes sense because by forcing
single-hop neighbors to lie close to each other, the average edge
length decreases, leading to fewer edge crossings. In addition,
increasing $\lambda$ yielded embeddings that were aesthetically more
appealing under fewer iterations. For instance, setting $\lambda = 10{,}000$
required only 30 iterations for a visualization that is comparable to
running 150 iterations with $\lambda = 0$. An application of this work is
travel time cartography in which edge lengths reflect the amount
of time it takes to travel between stations. In this case, $f(c_i)$ is
equivalent to the transit time from a station of interest to any other
station $i$. By selecting a station and specifying the travel time to
all other nodes, an informative radial map centered at the station of
interest can be generated.

5.2. Collaboration network of Arxiv General Relativity 

In this experiment, a large social network is considered from the
e-print arXiv repository covering scientific collaborations between
authors on papers submitted to the "General Relativity and Quantum
Cosmology" category (January 1993 to April 2003) [9]. The nodes
represent authors and an edge exists between nodes $i$ and $j$ if authors
$i$ and $j$ co-authored a paper. Although the network contains 5,242
nodes, the embedding considered only its largest strongly connected
component comprising 4,158 nodes. The objective was to embed
the network so that authors whose research is most related to the
majority of the others are placed closer to the center.

Fig. 3. Visualizing the London tube with a smoothing penalty; panel (d): $\lambda = 10{,}000$

Fig. 4. Embedding of a large social network

This behavior is best captured by the closeness centrality that is
defined as $c_i := \big(\sum_{j \in \mathcal{V}} d_{ij}\big)^{-1}$, where $d_{ij}$ is the geodesic distance
(lowest sum of edge weights) between nodes $i$ and $j$; it captures
the extent to which any node lies close to all other nodes [1, p. 88].
An informative mapping was obtained within 30 outer iterations. For
clarity and emphasis of the node positions, edges were not included
in the visualization. Drawings of graphs as large as the autonomous
systems within the Internet typically thin out most of the edges.
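Closeness centrality as defined above can be computed with one breadth-first search per node when all edges have unit weight. The sketch below makes that unit-weight assumption, takes the graph as a plain adjacency dictionary, and assumes it is connected; the function name `closeness_centrality` is hypothetical.

```python
from collections import deque

def closeness_centrality(adjacency):
    """Closeness c_i = 1 / sum_j d_ij for a connected, unweighted graph.

    adjacency maps each node to an iterable of its neighbors.
    """
    centrality = {}
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:                       # breadth-first search from source
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())         # sum of geodesic distances from source
        centrality[source] = 1.0 / total if total > 0 else 0.0
    return centrality
```

On a three-node path, the middle node obtains the largest closeness and would therefore be placed nearest the center of the embedding.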

Figure 4 shows the embedding with color coding reflecting
variations in centrality measure. The proposed approach based on
first-order methods leads to a fast algorithm for visualizing such large
networks.

6. CONCLUSIONS 

In this work, MDS-based means of embedding graphs with certain
structural constraints were proposed. In particular, an optimization
problem was formulated under centrality constraints that are used
to capture relative levels of importance between the nodes. A block
coordinate descent solver with successive approximations was developed
to deal with the non-convexity and non-smoothness of the
constrained MDS stress minimization problem. In addition, a smoothness
penalty term was incorporated to reduce the edge crossings
in the resultant network visualizations. Tests on real-world networks
were run and the results demonstrated that the algorithm converges,
and that large networks can be visualized relatively fast.